#message=FALSE means some output messages aren't printed in the knitted HTML document

knitr::opts_chunk$set(echo = FALSE, warning=FALSE, message=FALSE) 
#message=FALSE, echo=FALSE and warning=FALSE ensure any error messages aren't shown in the output HTML file

#message=FALSE means no output messages are printed on the knitted HMTL file
#loading any necessary packages
library(dplyr)
library(tidyverse) 
library(knitr)
library(readr)
library(ggplot2)
library(bbmle)
library(mizer) 
library(Hmisc)
library(lme4)
library(RColorBrewer)
library(viridis)
library(hexbin)
library(ggpattern) 

1 Introduction

This data set is formed from recordings taken by multiple ships between the years 1886-2016.

Fish were taken from some location, and their individual stomach contents recorded. Predators of the same species and (roughly) the same weight/size were recorded as a single data point (the number of predators for each data set is called ‘n_stomachs’). Prey of (roughly) the same size found in these stomachs were recorded in this same data point, and the total number of prey in some data point is called ‘prey_count’. For each individual data point,

\[ \text{no. prey per stomach} = \frac{\text{no. of stomachs sampled}}{\text{total no. of prey}} \] This is an approximation of the number of prey per stomach for some specific predator, and is called ‘no._prey_per_stmch’.

Using similar logic,‘prey_weight_g’ is the total weight of prey in the multiple stomachs sampled to make up one datapoint. ‘indiv_prey_weight’ is then the weight of one single prey individual found in the stomach

There are some other column names which are useful to note:

  • ices_rectangle accounts for location of points, where each area the points are sampled in is split up into a number of smaller rectangles and each is given a unique name
  • ppmr stands for ‘predator prey mass ratio’, and is calculated using:

\[ \text{PPMR} = \frac{\text{predator mass}}{\text{prey mass}} \] (where the prey mass is the average mass of prey of a certain type found in a single predator stomach) * haul_id is a identification tag given to each individual ship which recorded observations

2 Distribution of prey type eaten for each predator

These are individual bar charts showing the distribution of the type of prey each predator eats.

The prey types are:

  • Benthos: organisms that live on/near the bottom of a body of water
  • Fish: aquatic, craniate (have a skull), gill-bearing animals that lack limbs with digits (i.e. their limbs don’t have toes or fingers). Fish are a type of nekton, but for this data set fish are recorded as separate entities.
  • Nekton: the actively swimming aquatic organisms that can resist a strong current of water (the classification of nekton for this data set excludes any fish)
  • Other: miscellaneous types of aquatic animals that don’t fit into the other categories
  • Zooplankton: animal types of plankton (where plankton are aquatic organisms that are unable to swim effectively against currents)

We can see that some predators eat a range of prey types (e.g. Whiting), while others prefer to eat a single prey type in abundance (e.g. European Hake and Monkfish mostly consume fish as their prey).

3 Distribution of prey masses

This plot shows that different predator species consume differently sized distributions of prey. For example, some species prefer to eat a large quantity of a single size of prey, such as:

  • Horse Mackerel has a distinct peak at \(\log(\text{prey mass}) = -6.8\)
  • Norway Pout has one at \(log(\text{prey mass}) = -6.7\)

And some species show a wider range of prey masses which they consume, such as:

  • Cod
  • Sprat, and
  • Megrim

4 Most common prey weights

This graph is looking at the distribution of the weight of prey recorded, i.e. looking at what is the most common prey weight over all prey species. It is a graph over all the data points, so includes recordings from all the ships involved.

There are some potentially interesting results, so results from different ships were plotted on individual curves to further look into these results.

Taking a closer look at some of these ships gives the plots below:

These six graphs show a range of unusual looking points:

  1. \(y \propto e^{-x}\) relation for END04 (i.e. no. per stomach is proportional to 1/prey weight)
  2. lots of observations for single prey weights for LUC and CLYDE (554)
  3. lots of the same no. of prey per stomach observations for HIDDINK.(27180)
  4. the EXCmacDATSTO815 and Excmacdatsto815error seem to represent exactly the same data

There are a number of other ships which also provided potentially erroneous data, and the number of datapoints affected are as below:

Graph_Type Number_of_observations
y proportional to exp(-x) 2093
Single prey weight 554
Same number of prey per stomach 26854
Same data 5618

In total, 35445 out of the total 267431 observations lie within these potentially erroneous groups. This is: \[ \frac{35445}{267431} \times 100 = 13.25389 \% \]

This means that approximately \(13 \%\) of the data set comes from potentially unreliable sources. Though we will not remove or alter the data set to account for these potentially erroneous data points in this project, for further analysis it may be sensible to remove the data points these six ships recorded so that any results are as reliable as possible.

5 Prey and predator weight relation

We are attempting to find a link between the predator weight and the prey weight, and \(\log()\) of each axis is used to see the proportionality of the axes. The mathematics behind this is as follows:

\[ \log (\text{prey weight}) = m \times \text{log(predator weight)} + c , \]

where m is the gradient of the slope and c is the y-intercept (thinking of this graph as a linear model).

Taking the exponential of both sides,

\[ \text{prey weight} = \exp({{m}\times{\log(\text{predator weight})}}) + \log(D) , \] where \(\log(D) = c\).

Finally,

\[ \text{prey weight} = \text{predator weight}^m \times D \]

Hence, we want \(m = 1\) to show that predator weight is linearly proportional to prey weight (as expected). However, to simply show that the two variables are dependent on each other, we are looking for instances where \(m \neq 0\)

## Slope of the log(prey weight) v. log(predator weight) line of best fit: 0.6712536

For this graph, the slope is not equal to one. However, this graph is plotted over the entire data set and includes values from all of the available predator species.

Instead, separate the plots by predator species to see if there is some proportionality between the predator and prey weight for each specific species.

5.1 Prey v. predator weight plot, separated by predator species

5.2 Gradients of the plots for each predator species

Predator_Species Gradient
Blue Whiting -0.2433567
Cod 0.7269038
Common Dab 0.6359695
European Hake 0.7677213
Haddock 0.7004115
Herring 0.1507461
Horse Mackerel 0.5960227
Mackerel 0.2379693
Megrim 1.1524540
Monkfish 0.8168852
Norway Pout 0.9604312
Plaice 0.7014262
Poor Cod 1.0845397
Sole 0.3803803
Sprat 0.5874927
Whiting 0.6945304

Here, we are splitting up the graphs over each individual predator species to look and see if there is a proportionality between the predator and prey masses for each specific predator species.

We are wanting gradient \(\neq 0\) to suggest dependence between the predator and prey mass, and specifically gradient \(= 1\) to suggest that they are linearly proportional.

The gradient \(\neq 0 \pm 0.5\) for the predator species:

  • Cod
  • Common Dab
  • European Hake
  • Haddock
  • Horse Mackerel
  • Megrim
  • Monkfish
  • Norway Pout
  • Plaice
  • Poor Cod
  • Sprat
  • Whiting

This represents 13 out of the 16 possible predator species, hence the assumption that the predator and prey masses are dependent on each other is a reasonable one. Interestingly, these gradients are all also equal to \(1 \pm 0.5\), which suggests that there is a linear proportionality between these variables.

Having proportionality between the predator and prey masses allows us to use the PPMR (predator prey mass ratio) for further analysis in this project.

6 PPMR for individual species

Here, we are looking for the most common log(PPMR) for each individual species. This will be done by plotting log(PPMR) against the density of points, and weighting observations on different variables. By assuming these are normally distributed relations, there should be a ‘peak’ point of log(ppmr) which is where the mean/most common log(ppmr) value lies. We assume this will be different for different species of predator, i.e. each predator has a preferred ppmr value and hence each predator type has a preferred relative size of prey.

6.1 Weighted by prey biomass

These plots are weighted by the prey biomass. This means that we are looking at the distribution of prey biomass across values of log(PPMR), so points with a larger biomass are prioritised/weighted more than those with a smaller biomass.

6.2 Weighted by number of prey

These plots are weighted by the number pf prey per stomach. This means that data points with a larger number of prey per stomach are prioritised/weighted more than those with a smaller number of prey per stomach.

By looking at the two differently weighted graphs, it is clear to see that the plots are ‘shifted’ by some amount (e.g. the Blue Whiting when weighted by prey biomass has a mean log(ppmr) of roughly 4.8, and when weighted by number of prey this mean becomes roughly 7.5).

7 Specific PPMR calculations by different weightings for Herring species

Here, we take only observations relating to the predator type Herring. This allows us to do more specific analysis about how density observations can be weighted, and explain why the weighted means are shifted by some amount.

7.1 Weighted by prey biomass

The curve with points weighted by the prey biomass is plotted in blue, and a normal curve (also weighted by prey biomasss) is plotted over the top as a dashed black line.

## Mean value of log(ppmr), weighted by biomass of prey: 6.781942 
##  Standard deviation of this: 2.370106

7.2 Weighted by number of prey

The curve with points weighted by the number of prey per stomach is plotted in red, and a normal curve (also weighted by the number of prey per stomach) is plotted over the top as a dotted black line.

## Mean value of log(ppmr), weighted by the number of prey per stomach: 13.68691 
##  Standard deviation of this: 2.473435

7.3 Combined graph

This is a graph with both the the distribution of Herring log(PPMR) as weighted by prey biomass and number of prey plotted over each other (along with appropriately weighted normal distribution curves plotted over the top of each).

  • Prey biomass weighted mean: 6.629177
  • Number of prey weighted mean: 13.54102

and

  • Prey biomass standard deviation: 2.3508789268221
  • Number of prey standard deviation: 2.63330496940788

The mean is shifted by: \(13.54102 - 6.629177 = 6.911843\).

The mathematics of shifted means for this difference in weighting is:

\[ \text{weighted mean}_{\text{expected prey biomass}} = \text{weighted mean}_{\text{no. of prey }} - (\text{standard deviation}_{\text{no. of prey }})^2 \]

Hence, the expected result is:

\[ \text{weighted mean}_{\text{expected prey biomass}} = 13.54102 - (2.63330496940788)^2 = 6.60672493809 \] RECALCULATE

There is a difference in expected and actual prey biomass weighted mean of 0.30511806191. This is a fairly insignificant value, which suggests the shifting mean equations are relatively accurate for this data set with these weightings.

Therefore, we can continue with the assumption that there is a difference in the mean of log(ppmr) of the square of the standard deviation when we weight by number of prey per stomach and prey biomass.

8 Predator mass against ppmr

8.1 Plotting ppmr against predator mass, separated by predator species

Graphing \(\log\)(predator mass) v. \(\log\)(PPMR) for individual predator species to see if the predator mass related to the ppmr.

\[ \log(\text{PPMR}) = m \times \log (\text{predator mass}) + c \]

where m is the gradient (assuming a linear model between the variables) and c is the y-intercept. Then,

\[ \text{PPMR} = (\text{predator mass})^m + D \]

(where \(c = \log(D)\))

We want them to not be proportional (i.e. slope, m= 0) to prove that log(predator mass) is not related to the log(ppmr). This would mean that the ppmr is independent of the predator mass for a certain species (as assumed).

8.2 Gradients of the plots

Predator_species gradient
Blue Whiting 1.2433567
Cod 0.2730962
Common Dab 0.3640305
European Hake 0.2322787
Haddock 0.2995885
Herring 0.8492539
Horse Mackerel 0.4039773
Mackerel 0.7620307
Megrim -0.1524540
Monkfish 0.1831148
Norway Pout 0.0395688
Plaice 0.2985738
Poor Cod -0.0845397
Sole 0.6196197
Sprat 0.4125073
Whiting 0.3054696

For the assumption to be upheld, we need gradient \(= 0\). This is (approximately, by \(\pm 0.5\)) satisfied for the species:

  • Cod
  • Common Dab
  • European Hake
  • Haddock
  • Horse Mackerel
  • Megrim
  • Monkfish
  • Norway Pout
  • Plaice
  • Poor Cod
  • Sprat
  • Whiting

This means that the gradient of 12 out of the possible 16 species is equal to \(0 \pm 0.5\). However, this amount of error is fairly large, so we have not supported the idea that the assumption of independence between the \(\log(\text{ppmr})\) and \(\log(\text{predator mass})\) can be upheld for this data set.

8.3 Different weightings of the line of best fit

## slope of the log(ppmr) v. log(predator mass) line of best fit for Poor Cod: 
##  i) Weighted by number of prey in the stomach: -1.652398 
##  ii) Weighted by prey biomass: -0.001483607 
##  iii) No weighting in the calculation: -0.08453972

Looking at just the predator species ‘Poor Cod’. There are three LOBF (line of best fit), weighted by different variables:

  1. green and dotted: by number of prey per stomach
  2. red and dashed: by prey biomass
  3. blue: no weighting in the LOBF

When weighted by prey biomass, the gradient is equal to zero for 2 s.f.. Hence, the approximation (of slope $ = 0$) is supported by the data when looking at a prey biomass weighting.

However, given that this slope is only equal to \(0\) for one of these three weightings, the above linear models in fact show that the \(\log(\text{predator mass})\) and the \(\log(\text{PPMR})\) are dependent on one another.

8.4 Residuals of the log(predator mass) against log(ppmr) plot

We are now proceeding with the idea that the \(\log(\text{predator mass})\) and \(\log(\text{PPMR})\) are dependent on one another in some way, hence the predator mass is dependent of the PPMR.

Heteroscedasticity is when the residuals (error terms) are unequally scattered. For example, residuals may get more ‘spread out’ as the fitted values get larger. This occurs in data sets where there is a large range of data values, and is a problem because we want residuals with a constant variance. This is seen when the residuals are approximately normally distributed, and hence if residuals do not show a normal distribution then it is unclear whether or not the method used to infer the parameters of the linear relationship is a good one.

8.4.1 With residuals of each point to the predicted values of each point

This plot shows how distributed the points are in comparison to the line of best fit. For a ‘well fitting’ line of best fit, we would expect the residuals to be approximately normally distributed. As discussed earlier, the residual of a point is equal to: \[ \text{Residual} = \text{actual value} - \text{predicted/fitted value} , \] where the predicted/fitted value is equal to the value as predicted by the line of best fit.

8.4.2 Residuals v. fitted values

This graph plots the residual of each observation against its fitted value (as predicted by the line of best fit). This plot is used to detect unequal residual variances and any outliers. We can see that the residuals generally increase as the fitted values increase, which suggests that the residuals are potentially not normally distributed. To look further into this possibility, we will next plot a qqplot.

8.4.3 qqplot

A qqplot (quantile-quantile plot) shows if two data sets come from the same distribution. For this data set, we plot the actual residuals against the expected residuals if the residuals were normally distributed. The points on this graph should fall on the straight line (at a 45 degree angle), and if they don’t then the residuals do not appear to be normally distributed.

8.4.4 Density of residuals

We can see that for the predator species Herring, the residuals do not look like they are normally distributed. This suggests that the residuals are unequally scattered, and hence we continue with the idea that \(\log{(\text{ppmr})}\) and \(\log{(\text{predator mass})}\) are independent of each other.

8.5 Residual plots for the individual predator species

This is a histogram plot of the residuals of the plot, separated for each predator species. As discussed earlier, a normal distribution of the density of residuals would suggest that the data fits a linear model. This would then infer that the axes would be related to one another.

However, none of the predator species produce a density plot which looks convincingly normally distributed. Therefore, we can say that this data suggest that the PPMR and predator mass are independent of one another when modelling using a linear model.

Instead use a different model (a mixed effects model) for this data set to attempt to more accurately approximate certain parameters in the variable relationships.

9 Introducing a mixed effects model

We will introduce a mixed effects model to account for some variance that external effects may cause to the data.

9.1 Building up the model

9.1.1 Basic Linear model

This is mathematically represented by: \[ \log(\text{ppmr})_i = \beta_1 \times \log(\text{predator mass})_i + \beta_0 \]

Where:

  • \(i\) is the individual observation
  • \(\beta_{0}\) is the overall intercept term (defined at the point where \(\log(\text{predator mass}) = 0\).)
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)

This plot is very difficult to interpret, and the relationship between the \(\log(\text{ppmr})\) and \(\log(\text{predator mass})\) is difficult to visualise as there is not a single gradient for each predator species. To aid in our interpretation, we will add random effects to account for some of the variance in the \(\log(\text{ppmr})\) values.

9.1.2 First trial

Fixed effects: \(\log\)(predator mass)

Random effects: predator species (fixed slopes and random intercepts)

(Intercept) lpred_weight
Blue Whiting 6.432030 0.3000052
Cod 4.343735 0.3000052
Common Dab 4.364466 0.3000052
European Hake 2.617691 0.3000052
Haddock 6.713380 0.3000052
Herring 8.444808 0.3000052
Horse Mackerel 7.722993 0.3000052
Mackerel 7.869912 0.3000052
Megrim 3.370159 0.3000052
Monkfish 2.323265 0.3000052
Norway Pout 5.981170 0.3000052
Plaice 5.448001 0.3000052
Poor Cod 4.840859 0.3000052
Sole 5.579954 0.3000052
Sprat 4.395767 0.3000052
Whiting 4.534865 0.3000052

Here, there is the same slope for every predator species, but each species group has a different y-intercept (i.e. the line for each predator species group crosses the \(\log\)(predator mass)=0 line at a different value for \(\log\)(PPMR).

This is mathematically represented by:

\[ \log(\text{ppmr})_{ij} = (\beta_{0} + \alpha_{0j}) + \beta_{1} \log(\text{predator mass})_{i} + \epsilon_{ij} \]

Where:

  • \(i\) is the individual observation
  • \(j\) is the predator species
  • \(\beta_{0}\) is the overall intercept term
  • \(\alpha_{0j}\) is the random intercept term for predator species \(j\) (where \(\alpha_{0j}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{A[00]}\) (i.e. \(\alpha_{0j} \sim (0, \omega^2_{A[00]})\))).
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)
  • \(\epsilon_{ij}\) is the error term, which describes the residuals of the model

9.1.3 Second trial

Fixed effects: \(\log\)(predator mass)

Random effects: predator species (random slope and fixed intercept)

(Intercept) lpred_weight
Blue Whiting 4.73856 0.7482483
Cod 4.73856 0.2408170
Common Dab 4.73856 0.2186137
European Hake 4.73856 -0.0489319
Haddock 4.73856 0.6685136
Herring 4.73856 1.0858146
Horse Mackerel 4.73856 0.8435213
Mackerel 4.73856 0.8386599
Megrim 4.73856 0.0249256
Monkfish 4.73856 -0.0386645
Norway Pout 4.73856 0.5280212
Plaice 4.73856 0.4320842
Poor Cod 4.73856 0.3121562
Sole 4.73856 0.4969605
Sprat 4.73856 0.1496522
Whiting 4.73856 0.2639261

This is mathematically represented by:

\[ \log(\text{ppmr})_{ij} = \beta_{0} + (\beta_{1} + \alpha_{1j}) \log(\text{predator mass})_{i} + \epsilon_{ij} \]

Where:

  • \(i\) is the individual observation
  • \(j\) is the predator species
  • \(\beta_{0}\) is the overall intercept term
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)
  • \(\alpha_{1j}\) is the slope term that’s dependent on the random effect (the predator species) (where \(\alpha_{1j}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{A[10]}\) (i.e. \(\alpha_{1j} \sim (0, \omega^2_{A[10]})\))).
  • \(\epsilon_{ij}\) is the error term, which describes the residuals of the model

For this plot, the value of log(predator mass) is modelled to be dependent on the predator species. This allows us to have lines for each predator group which all have the same intercept but differing slopes.

9.1.4 Third trial

Fixed effects: \(\log\)(predator mass)

Random effects: predator species (random slope and random intercept)

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: lppmr ~ lpred_weight + (1 + lpred_weight | pred_species)
##    Data: renamed_df
## 
##       AIC       BIC    logLik  deviance  df.resid 
## 1106110.6 1106173.3 -553049.3 1106098.6    254589 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3221 -0.6579 -0.1165  0.6443  6.6266 
## 
## Random effects:
##  Groups       Name         Variance Std.Dev. Corr 
##  pred_species (Intercept)  1.6843   1.2978        
##               lpred_weight 0.1086   0.3296   -0.28
##  Residual                  4.5083   2.1233        
## Number of obs: 254595, groups:  pred_species, 16
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   4.96998    0.33067  15.030
## lpred_weight  0.37569    0.08379   4.484
## 
## Correlation of Fixed Effects:
##             (Intr)
## lpred_weght -0.307

(Intercept) lpred_weight
Blue Whiting 2.940214 1.1799729
Cod 4.503184 0.2730910
Common Dab 4.105908 0.3587418
European Hake 3.067099 0.2218485
Haddock 6.714389 0.2997893
Herring 6.072514 0.8489858
Horse Mackerel 7.148247 0.4111815
Mackerel 5.194771 0.7616553
Megrim 5.544190 -0.1283793
Monkfish 3.246866 0.1670623
Norway Pout 6.519591 0.0479265
Plaice 5.452254 0.2990344
Poor Cod 5.985208 -0.0239622
Sole 4.306496 0.5845767
Sprat 4.201450 0.4039674
Whiting 4.517309 0.3054689

This is mathematically represented by: \[ \log(\text{ppmr})_{ij} = (\beta_{0} + \alpha_{0j}) + (\beta_{1} + \alpha_{1j}) \log(\text{predator mass})_{i} + \epsilon_{ij} \]

Where:

  • \(i\) is the individual observation
  • \(j\) is the predator species
  • \(\beta_{0}\) is the overall intercept term
  • \(\alpha_{0j}\) is the random intercept term for predator species \(j\) (where \(\alpha_{0j}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{A[00]}\) (i.e. \(\alpha_{0j} \sim (0, \omega^2_{A[00]})\))).
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)
  • \(\alpha_{1j}\) is the slope term that’s dependent on the random effect (the predator species)
  • \(\epsilon_{ij}\) is the error term, which describes the residuals of the model

From the table, we can see this model features slopes and intercepts that are different for each predator group are different.

9.1.5 Comparing the models

## Data: renamed_df
## Models:
## one: lppmr ~ lpred_weight + (1 | pred_species)
## two: lppmr ~ lpred_weight + (0 + lpred_weight | pred_species)
## three: lppmr ~ lpred_weight + (1 + lpred_weight | pred_species)
##       npar     AIC     BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## one      4 1108372 1108414 -554182  1108364                         
## two      4 1110844 1110886 -555418  1110836    0.0  0               
## three    6 1106111 1106173 -553049  1106099 4737.5  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

9.2 Trialing different models - only for Herring

Here, all the random effects are modeled with a randomly distributed slope and intercept.

9.2.1 Trial a

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: lppmr ~ lpred_weight + (1 + lpred_weight | haul_id_short)
##    Data: herring_df
## 
##      AIC      BIC   logLik deviance df.resid 
##  13141.8  13177.6  -6564.9  13129.8     2907 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1997 -0.4304  0.0223  0.4733  3.5513 
## 
## Random effects:
##  Groups        Name         Variance Std.Dev. Corr 
##  haul_id_short (Intercept)  10.34399 3.2162        
##                lpred_weight  0.05564 0.2359   -1.00
##  Residual                    5.21371 2.2834        
## Number of obs: 2913, groups:  haul_id_short, 13
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   6.18076    0.95750   6.455
## lpred_weight  0.55528    0.08691   6.389
## 
## Correlation of Fixed Effects:
##             (Intr)
## lpred_weght -0.894
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

This is mathematically represented by:

\[ \log(\text{ppmr})_{ih} = (\beta_{0} + H_{0h}) + (\beta_{1} + H_{1h}) \log(\text{predator mass})_{1i} + \epsilon_{ih} \]

Where:

  • \(i\) is the individual observation
  • \(h\) is the different haul_id name
  • \(\beta_{0}\) is the overall intercept term
  • \(H_{0h}\) is the random intercept term for haul_id \(h\) (where \(H_{0h}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{H[00]}\) (i.e. \(H_{0h} \sim (0, \omega^2_{H[00]})\))).
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)
  • \(H_{1h}\) is the slope term that’s dependent on the random effect (the haul_id)
  • \(\epsilon_{ih}\) is the error term, which describes the residuals of each point

And:

  • Fixed effects: \(\log\)(predator mass)
  • Random effects: ship name

9.2.2 Trial b

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: lppmr ~ lpred_weight + (1 + lpred_weight | haul_id_short) + (1 +  
##     lpred_weight | year)
##    Data: herring_df
## 
##      AIC      BIC   logLik deviance df.resid 
##  13103.2  13157.0  -6542.6  13085.2     2904 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.5000 -0.4584 -0.0002  0.5399  3.5830 
## 
## Random effects:
##  Groups        Name         Variance Std.Dev. Corr 
##  year          (Intercept)  0.386395 0.62161       
##                lpred_weight 0.009257 0.09621  -0.47
##  haul_id_short (Intercept)  7.653812 2.76655       
##                lpred_weight 0.026492 0.16276  -1.00
##  Residual                   5.108200 2.26013       
## Number of obs: 2913, groups:  year, 22; haul_id_short, 13
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   5.80577    0.85977   6.753
## lpred_weight  0.66658    0.08799   7.576
## 
## Correlation of Fixed Effects:
##             (Intr)
## lpred_weght -0.747
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Mathematical representation \[ \log(\text{ppmr})_{ihy} = (\beta_{0} + H_{0h} + Y_{0y}) + (\beta_{1} + H_{1h} + Y_{1y}) \log(\text{predator mass})_{1i} + \epsilon_{ihyr} \]

Where:

  • \(y\) is the different year the observation was taken
  • \(Y_{0y}\) is the random intercept term for haul_id \(h\) (where \(Y_{0y}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{Y[00]}\) (i.e. \(Y_{0y} \sim (0, \omega^2_{Y[00]})\))).
  • \(Y_{1y}\) is the slope term that’s dependent on the random effect (the haul_id)

And:

  • Fixed effects: \(\log\)(predator mass)
  • Random effects: ship name, year

9.2.3 Trial c

## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: lppmr ~ lpred_weight + (1 + lpred_weight | haul_id_short) + (1 +  
##     lpred_weight | year) + (1 + lpred_weight | ices_rectangle)
##    Data: herring_df
## 
##      AIC      BIC   logLik deviance df.resid 
##  13035.0  13106.8  -6505.5  13011.0     2901 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2280 -0.4625  0.0352  0.5052  3.8013 
## 
## Random effects:
##  Groups         Name         Variance Std.Dev. Corr 
##  ices_rectangle (Intercept)  2.531025 1.59092       
##                 lpred_weight 0.034443 0.18559  -1.00
##  year           (Intercept)  0.086118 0.29346       
##                 lpred_weight 0.010707 0.10347  1.00 
##  haul_id_short  (Intercept)  5.751788 2.39829       
##                 lpred_weight 0.007871 0.08872  -1.00
##  Residual                    4.828733 2.19744       
## Number of obs: 2913, groups:  ices_rectangle, 152; year, 22; haul_id_short, 13
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   5.67779    0.82585   6.875
## lpred_weight  0.64541    0.09742   6.625
## 
## Correlation of Fixed Effects:
##             (Intr)
## lpred_weght -0.624
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Mathematical representation: \[ \log(\text{ppmr})_{ihyr} = (\beta_{0} + H_{0h} + Y_{0y} + R_{0r} )+ (\beta_{1} + H_{1h} + Y_{1y} + R_{1r}) \log(\text{predator mass})_{1i} + \epsilon_{ihyr} \]

Where:

  • \(r\) is the different ices_rectangle term
  • \(R_{0r}\) is the random intercept term for the ices_rectangle \(r\) (where \(R_{0r}\) is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{R[00]}\) (i.e. \(R_{0r} \sim (0, \omega^2_{R[00]})\))).
  • \(R_{1r}\) is the slope term that’s dependent on the random effect (the haul_id)

And:

  • Fixed effects: \(\log\)(predator mass)
  • Random effects: ship name, year, ices_rectangle

9.2.4 Comparing

Variance of the groups decreases as other random effects are added to the models. But, the error term (standard error) roughly increases as more random effects are added to the model.

9.3 Mixed effects model - over all species

Model c appears to represent the data the most effectively. We now want to consider a mixed effects model over all the predator species (not just Herring). Therefore, we choose to continue with model c (with random effects:ship name, year, ices_rectangle), but we also introduce a fourth random effect of predator species.

Mathematical representation: \[ \log(\text{ppmr})_{ihyrp} = (\beta_{0} + H_{0h} + Y_{0y} + R_{0r} + P_{or}) + (\beta_{1} + H_{1h} + Y_{1y} + R_{1r} + P_{1r}) \log(\text{predator mass})_i + \epsilon_{ihyrp} \] Where:

  • \(i\) is the individual observation
  • \(h\) is the different ship name
  • \(y\) is the the year each observation is taken
  • \(r\) is the different ices_rectangle term
  • \(p\) is the predator species
  • \(\beta_{0}\) is the overall intercept term
  • \(H_{0h}\) is the random intercept term for haul_id, \(h\)
  • \(Y_{0y}\) is the random intercept term for year, \(y\)
  • \(R_{0r}\) is the random intercept term for ices_rectangle, \(r\)
  • \(P_{0p}\) is the random intercept term for the predator species, \(p\)
  • \(\beta_{1}\) is the slope term that’s dependent on the fixed effect (the predator mass)
  • \(H_{1h}\) is the slope term that’s dependent on the haul_id
  • \(Y_{1y}\) is the slope term that’s dependent on the year
  • \(R_{1r}\) is the slope term that’s dependent on the ices_rectangle
  • \(P_{1p}\) is the slope term that’s dependent on the predator species
  • \(\epsilon_{ihyr}\) is the error term, which describes the residuals of each point

(where each random intercept term is normally distributed with a mean of \(0\) and a standard deviation \(\omega_{[00]}\) (i.e. \(sim (0, \omega^2_{[00]})\))).

And:

  • Fixed effects: \(\log\)(predator mass)
  • Random effects: ship name, year, ices_rectangle, predator species
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: lppmr ~ lpred_weight + (1 + lpred_weight | haul_id_short) + (1 +  
##     lpred_weight | year) + (1 + lpred_weight | ices_rectangle) +  
##     (1 + lpred_weight | pred_species)
##    Data: renamed_df
## 
##       AIC       BIC    logLik  deviance  df.resid 
## 1050649.5 1050806.2 -525309.8 1050619.5    254580 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.7029 -0.6066 -0.0387  0.5777  6.7472 
## 
## Random effects:
##  Groups         Name         Variance Std.Dev. Corr 
##  ices_rectangle (Intercept)  0.74808  0.8649        
##                 lpred_weight 0.02243  0.1498   -0.77
##  year           (Intercept)  1.01486  1.0074        
##                 lpred_weight 0.01590  0.1261   -0.84
##  haul_id_short  (Intercept)  2.79267  1.6711        
##                 lpred_weight 0.04027  0.2007   -0.47
##  pred_species   (Intercept)  0.60683  0.7790        
##                 lpred_weight 0.02887  0.1699   -0.29
##  Residual                    3.58532  1.8935        
## Number of obs: 254595, groups:  
## ices_rectangle, 718; year, 97; haul_id_short, 97; pred_species, 16
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)   5.12662    0.31570  16.239
## lpred_weight  0.28973    0.05613   5.161
## 
## Correlation of Fixed Effects:
##             (Intr)
## lpred_weght -0.489
## optimizer (nloptwrap) convergence code: 0 (OK)
## Model failed to converge with max|grad| = 0.0300338 (tol = 0.002, component 1)

This model has the following variables:

Variable name Meaning Value
\(\beta_{0}\) the intercept term 4.95089
\(\beta_{1}\) the fixed effect (slope) for \(\log(\text{predator mass})\) 0.30651
\(\omega_{H[00]}\) s.d. for the intercept term \(H_{0h}\), dependent on the haul_id_short 1.9026
\(\omega_{H[10]}\) s.d. for the slope term \(H_{1h}\), dependent on the haul_id_short 0.3481
\(\omega_{Y[00]}\) s.d. for the intercept term \(Y_{0y}\), dependent on the year 0.9498
\(\omega_{Y[10]}\) s.d. for the slope term \(Y_{1y}\), dependent on the year 0.1553
\(\omega_{R[00]}\) s.d. for the intercept term \(R_{0r}\), dependent on the ices_rectangle 0.9367
\(\omega_{R[10]}\) s.d. for the slope term \(R_{1r}\), dependent on the ices_rectangle 0.1736
\(\epsilon_{ihyr}\) the s.d. of the residual/error term 1.9024

9.4 Separated by predator species plots

Separating the plots out over the predator species provides the following information:

9.4.1 log(PPMR) v. log(predator mass) plot

9.4.2 Slope and intercept values

(Intercept) lpred_weight
Blue Whiting 5.336635 0.2206427
Cod 5.400622 0.1684724
Common Dab 4.283152 0.3446998
European Hake 4.499211 0.1022893
Haddock 5.458341 0.2877951
Herring 5.253578 0.4723981
Horse Mackerel 5.995314 0.3393641
Mackerel 5.531171 0.4232982
Megrim 5.992592 -0.0541812
Monkfish 3.416616 0.2045385
Norway Pout 6.163284 0.2500984
Plaice 4.314475 0.5501514
Poor Cod 5.094767 0.2727697
Sole 4.486150 0.5155735
Sprat 5.562778 0.4107018
Whiting 5.237223 0.1270132

9.4.3 Density of residuals